Diagnosis Predictive Model For Classification Problem

Exploratory Data Analysis

Distribution of Label

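The plot above shows that the dataset has 357 samples labelled B (benign) and 212 labelled M (malignant), so the classes are moderately imbalanced (about 63% B versus 37% M).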
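
A minimal sketch of how these class counts can be obtained. It assumes the data is the Breast Cancer Wisconsin (Diagnostic) dataset, which the 357/212 split matches. It uses scikit-learn's bundled copy as a stand-in for whatever file the notebook actually reads, and recreates the M/B `diagnosis` column from that copy's numeric target (0 = malignant, 1 = benign).

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Stand-in data (assumption): scikit-learn's bundled copy of the
# Breast Cancer Wisconsin (Diagnostic) dataset, target 0 = malignant, 1 = benign.
data = load_breast_cancer(as_frame=True)
df = data.frame.copy()

# Recreate a text label column like the notebook's "diagnosis" (M / B).
df["diagnosis"] = df["target"].map({0: "M", 1: "B"})

# Count how many rows belong to each class.
counts = df["diagnosis"].value_counts()
print(counts)
```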

Pairplot of All Features

The pairplot above shows the pairwise relationships between the features. Judging by the histograms on its diagonal, most features look roughly normally distributed, although a histogram is only a rough visual check of normality.

Jointplot Between 2 Features

We plot a jointplot of the texture mean and radius mean features, with points coloured by the diagnosis label. More points belong to class B, which is consistent with the class distribution shown earlier.

Importing Libraries

Label Encoding
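
The notebook's encoding code is not shown. A common approach, assumed here, is scikit-learn's `LabelEncoder`, which maps the `diagnosis` strings to integers in sorted class order. The five-row frame below is hypothetical toy data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the notebook's DataFrame.
df = pd.DataFrame({"diagnosis": ["M", "B", "B", "M", "B"]})

encoder = LabelEncoder()
# Classes are sorted before encoding, so B -> 0 and M -> 1.
df["diagnosis"] = encoder.fit_transform(df["diagnosis"])

print(encoder.classes_.tolist())  # ['B', 'M']
print(df["diagnosis"].tolist())   # [1, 0, 0, 1, 0]
```

Because the classes are sorted alphabetically, the positive class (1) is malignant. With pandas alone, `df["diagnosis"].map({"B": 0, "M": 1})` gives the same mapping.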

Extract Dependent and Independent Variables
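
A sketch of separating the dependent variable (the label) from the independent variables (the features), again using scikit-learn's bundled copy of the dataset as a stand-in (assumption). The label is encoded as 1 = M and 0 = B, matching sorted-order label encoding.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in data (assumption).
data = load_breast_cancer(as_frame=True)
df = data.frame.copy()

# Replace the bundled target (0 = malignant) with a diagnosis column where 1 = M, 0 = B.
df["diagnosis"] = (df.pop("target") == 0).astype(int)

# Dependent variable: what we predict. Independent variables: every other column.
y = df["diagnosis"]
X = df.drop(columns=["diagnosis"])
print(X.shape, y.shape)  # (569, 30) (569,)
```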

Train-Test Split
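
The split parameters are not given in this section. This sketch assumes scikit-learn's `train_test_split` with a 25% test set (its default `test_size`), stratified on the label so both parts keep roughly the same B/M ratio, and an arbitrary `random_state=42` for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): features as a DataFrame, target as a Series.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 25% of rows held out for testing; stratify=y preserves the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 426 143
```

With 569 rows, a 25% test fraction rounds up to 143 test rows, leaving 426 for training.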

Model Building - Random Forest

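We choose a random forest for classification because it tends to generalize better than a single decision tree. A random forest trains many decision trees, each on a bootstrap sample of the rows and considering only a random subset of the features at each split, and then combines their votes. This randomized feature selection decorrelates the trees, which reduces variance, so the forest is typically more accurate than a single decision tree.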
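
A minimal sketch of training a random forest and, for comparison, a single decision tree on the same split. The hyperparameters (`n_estimators=200`, `max_features="sqrt"`, `random_state=42`), the 25% stratified split, and the use of scikit-learn's bundled copy of the dataset are assumptions, not the notebook's settings. In scikit-learn, `max_features="sqrt"` is what gives each split its random feature subset; with 30 features, each split considers `int(sqrt(30)) = 5` of them. No accuracy figures are claimed here; run it to see them on your split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and split (assumptions).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 200 trees; bootstrap sampling of rows is on by default, and
# max_features="sqrt" restricts each split to a random subset of features.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)

# A single unpruned decision tree on the same data, for comparison.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

forest_acc = accuracy_score(y_test, forest.predict(X_test))
tree_acc = accuracy_score(y_test, tree.predict(X_test))
print("forest accuracy:", forest_acc)
print("tree accuracy:  ", tree_acc)
```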